HTTP
Import flat files from the web
The utils functions to import flat file data, such as read.csv() and read.delim(), are capable of automatically importing from URLs that point to flat files on the web.
You must be wondering whether Hadley Wickham’s alternative package, readr, is equally potent. Well, figure it out in this exercise! The URLs for both a .csv file as well as a .delim file are already coded for you. It’s up to you to actually import the data. If it works, that is…
# Load the readr package
library(readr)
# Import the csv file: pools
url_csv <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/swimming_pools.csv"
pools <- read_csv(url_csv)## Parsed with column specification:
## cols(
## Name = col_character(),
## Address = col_character(),
## Latitude = col_double(),
## Longitude = col_double()
## )
# Import the txt file: potatoes
url_delim <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/potatoes.txt"
potatoes <- read_tsv(url_delim)## Parsed with column specification:
## cols(
## area = col_double(),
## temp = col_double(),
## size = col_double(),
## storage = col_double(),
## method = col_double(),
## texture = col_double(),
## flavor = col_double(),
## moistness = col_double()
## )
# Print pools and potatoes
pools %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")| Name | Address | Latitude | Longitude |
|---|---|---|---|
| Acacia Ridge Leisure Centre | 1391 Beaudesert Road, Acacia Ridge | -27.58616 | 153.0264 |
| Bellbowrie Pool | Sugarwood Street, Bellbowrie | -27.56547 | 152.8911 |
| Carole Park | Cnr Boundary Road and Waterford Road Wacol | -27.60744 | 152.9315 |
| Centenary Pool (inner City) | 400 Gregory Terrace, Spring Hill | -27.45537 | 153.0251 |
| Chermside Pool | 375 Hamilton Road, Chermside | -27.38583 | 153.0351 |
| Colmslie Pool (Morningside) | 400 Lytton Road, Morningside | -27.45516 | 153.0789 |
potatoes %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")| area | temp | size | storage | method | texture | flavor | moistness |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 2.9 | 3.2 | 3.0 |
| 1 | 1 | 1 | 1 | 2 | 2.3 | 2.5 | 2.6 |
| 1 | 1 | 1 | 1 | 3 | 2.5 | 2.8 | 2.8 |
| 1 | 1 | 1 | 1 | 4 | 2.1 | 2.9 | 2.4 |
| 1 | 1 | 1 | 1 | 5 | 1.9 | 2.8 | 2.2 |
| 1 | 1 | 1 | 2 | 1 | 1.8 | 3.0 | 1.7 |
Secure importing
In the previous exercises, you have been working with URLs that all start with http://. There is, however, a safer alternative to HTTP, namely HTTPS, which stands for HypterText Transfer Protocol Secure. Just remember this: HTTPS is relatively safe, HTTP is not.
Luckily for us, you can use the standard importing functions with https:// connections since R version 3.2.2.
# https URL to the swimming_pools csv file.
url_csv <- "https://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/swimming_pools.csv"
# Import the file using read.csv(): pools1
pools1 <- read.csv(url_csv)
# Load the readr package
library(readr)
# Import the file using read_csv(): pools2
pools2 <- read_csv(url_csv)## Parsed with column specification:
## cols(
## Name = col_character(),
## Address = col_character(),
## Latitude = col_double(),
## Longitude = col_double()
## )
# Print the structure of pools1 and pools2
str(pools1)## 'data.frame': 20 obs. of 4 variables:
## $ Name : Factor w/ 20 levels "Acacia Ridge Leisure Centre",..: 1 2 3 4 5 6 19 7 8 9 ...
## $ Address : Factor w/ 20 levels "1 Fairlead Crescent, Manly",..: 5 20 18 10 9 11 6 15 12 17 ...
## $ Latitude : num -27.6 -27.6 -27.6 -27.5 -27.4 ...
## $ Longitude: num 153 153 153 153 153 ...
str(pools2)## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 20 obs. of 4 variables:
## $ Name : chr "Acacia Ridge Leisure Centre" "Bellbowrie Pool" "Carole Park" "Centenary Pool (inner City)" ...
## $ Address : chr "1391 Beaudesert Road, Acacia Ridge" "Sugarwood Street, Bellbowrie" "Cnr Boundary Road and Waterford Road Wacol" "400 Gregory Terrace, Spring Hill" ...
## $ Latitude : num -27.6 -27.6 -27.6 -27.5 -27.4 ...
## $ Longitude: num 153 153 153 153 153 ...
## - attr(*, "spec")=
## .. cols(
## .. Name = col_character(),
## .. Address = col_character(),
## .. Latitude = col_double(),
## .. Longitude = col_double()
## .. )
Import Excel files from the web
When you learned about gdata, it was already mentioned that gdata can handle .xls files that are on the internet. readxl can’t, at least not yet. The URL with which you’ll be working is already available in the sample code. You will import it once using gdata and once with the readxl package via a workaround.
# Load gdata package - Does not owrk as Perl Strawberry not installed
library(gdata)
# Specification of url: url_xls
url_xls <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/latitude.xls"
# Import the .xls file with gdata: excel_gdata
excel_gdata <- read.xls(url_xls, perl = "C:/myperl/perl/bin/perl.exe")
excel_gdata %>%
kable(caption = "Latitude from read.xls") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689") %>%
scroll_box(width = "100%", height = "300px")| country | X1700 |
|---|---|
| Afghanistan | 34.5650000 |
| Akrotiri and Dhekelia | 34.6166667 |
| Albania | 41.3120000 |
| Algeria | 36.7200000 |
| American Samoa | -14.3070000 |
| Andorra | 42.5460000 |
| Angola | -8.8430000 |
| Anguilla | 18.2500000 |
| Antigua and Barbuda | 17.0720000 |
| Argentina | -36.6760000 |
| Armenia | 40.2540000 |
| Aruba | 12.5130000 |
| Australia | -32.2190000 |
| Austria | 48.2310000 |
| Azerbaijan | 40.3520000 |
| Bahamas | 24.7000000 |
| Bahrain | 26.0240000 |
| Bangladesh | 23.8800000 |
| Barbados | 13.1790000 |
| Belarus | 53.5470900 |
| Belgium | 50.8370000 |
| Belize | 17.8430000 |
| Benin | 6.3640000 |
| Bermuda | 32.2170000 |
| Bhutan | 27.4790000 |
| Bolivia | -15.1900000 |
| Bosnia and Herzegovina | 44.1750100 |
| Botswana | -21.5360000 |
| Brazil | -19.5570000 |
| British Virgin Islands | 18.5000000 |
| Brunei | 4.5010000 |
| Bulgaria | 42.0730000 |
| Burkina Faso | 12.0490000 |
| Burundi | -3.3650000 |
| Cambodia | 12.0260000 |
| Cameroon | 10.7300000 |
| Canada | 43.7270000 |
| Cape Verde | 15.0910000 |
| Cayman Islands | 19.3190000 |
| Central African Rep. | 4.3310000 |
| Chad | 10.3770000 |
| Channel Islands | 49.2170000 |
| Chile | -33.5540000 |
| China | 29.5610000 |
| Christmas Island | -10.5000000 |
| Cocos Island | -12.5000000 |
| Colombia | 4.7880000 |
| Comoros | -11.6710000 |
| Congo, Dem. Rep. | -2.9202760 |
| Congo, Rep. | -3.6840000 |
| Cook Islands | -21.2333333 |
| Costa Rica | 9.9410000 |
| Cote d’Ivoire | 5.4960000 |
| Croatia | 45.1100800 |
| Cuba | 23.0840000 |
| Cyprus | 35.0810000 |
| Czech Rep. | 49.7792900 |
| Denmark | 55.7180000 |
| Djibouti | 11.5050000 |
| Dominica | 15.4330000 |
| Dominican Rep. | 18.5610000 |
| East Germany | NA |
| Ecuador | -2.0620000 |
| Egypt | 29.9960000 |
| El Salvador | 13.7750000 |
| Equatorial Guinea | 2.3260000 |
| Eritrea | 15.3120000 |
| Estonia | 58.6850000 |
| Ethiopia | 9.0070000 |
| Faeroe Islands | 62.0000000 |
| Falkland Islands (Malvinas) | -51.7500000 |
| Fiji | -17.8270000 |
| Finland | 60.2120000 |
| France | 48.8570000 |
| French Guiana | 3.9880000 |
| French Polynesia | -17.6660000 |
| Gabon | 0.3720000 |
| Gambia | 13.2570000 |
| Georgia | 42.0340000 |
| Germany | 48.1610000 |
| Ghana | 6.6940000 |
| Gibraltar | 36.1333333 |
| Greece | 38.0580000 |
| Greenland | 64.2170000 |
| Grenada | 12.1150000 |
| Guadeloupe | 16.1630000 |
| Guam | 13.4400000 |
| Guatemala | 14.6220000 |
| Guernsey | 49.4666667 |
| Guinea | 11.6710000 |
| Guinea-Bissau | 12.2620000 |
| Guyana | 5.7610000 |
| Haiti | 18.9320000 |
| Holy See | 41.9000000 |
| Honduras | 14.1940000 |
| Hong Kong, China | 22.7040000 |
| Hungary | 47.4190000 |
| Iceland | 63.8920000 |
| India | 25.2740000 |
| Indonesia | -6.5620000 |
| Iran | 35.3800000 |
| Iraq | 33.3140000 |
| Ireland | 54.6110000 |
| Isle of Man | 54.2250000 |
| Israel | 32.0840000 |
| Italy | 45.4150000 |
| Jamaica | 18.0550000 |
| Japan | 35.7120000 |
| Jersey | 49.2500000 |
| Jordan | 31.6020000 |
| Kazakhstan | 44.3080000 |
| Kenya | -0.5130000 |
| Kiribati | 1.8470000 |
| Korea, Dem. Rep. | 39.5270000 |
| Korea, Rep. | 37.5530000 |
| Kosovo | 42.5833333 |
| Kuwait | 29.3260000 |
| Kyrgyzstan | 41.4894000 |
| Laos | 16.5470000 |
| Latvia | 56.8580000 |
| Lebanon | 34.1110000 |
| Lesotho | -29.5950000 |
| Liberia | 6.3850000 |
| Libya | 32.6070000 |
| Liechtenstein | 47.1410000 |
| Lithuania | 55.3180000 |
| Luxembourg | 49.7800000 |
| Macao, China | 22.5060000 |
| Macedonia, FYR | 41.5738100 |
| Madagascar | -18.9580000 |
| Malawi | -15.8110000 |
| Malaysia | 3.2690000 |
| Maldives | 3.2500000 |
| Mali | 12.5080000 |
| Malta | 35.8870000 |
| Marshall Islands | 9.0000000 |
| Martinique | 14.6540000 |
| Mauritania | 17.9250000 |
| Mauritius | -20.2320000 |
| Mayotte | -12.8333333 |
| Mexico | 16.7590000 |
| Micronesia, Fed. Sts. | 7.3570000 |
| Moldova | 47.1660000 |
| Monaco | 43.7460000 |
| Mongolia | 47.4930000 |
| Montenegro | 42.7889900 |
| Montserrat | 16.7500000 |
| Morocco | 33.5930000 |
| Mozambique | -18.4990000 |
| Myanmar | 17.6790000 |
| Namibia | -17.9790000 |
| Nauru | -0.5333333 |
| Nepal | 27.7120000 |
| Netherlands | 51.8740000 |
| Netherlands Antilles | 12.1920000 |
| New Caledonia | -21.3290000 |
| New Zealand | -36.8920000 |
| Nicaragua | 12.2110000 |
| Niger | 13.8760000 |
| Nigeria | 6.5430000 |
| Niue | -19.0333333 |
| Norfolk Island | -29.0333333 |
| Northern Mariana Islands | 15.1780000 |
| Norway | 59.9770000 |
| Oman | 20.4450000 |
| Pakistan | 31.1730000 |
| Palau | 7.5000000 |
| Panama | 9.2060000 |
| Papua New Guinea | -6.6000000 |
| Paraguay | -25.5830000 |
| Peru | -11.7940000 |
| Philippines | 13.9220000 |
| Pitcairn | -25.0666667 |
| Poland | 50.2440000 |
| Portugal | 38.8160000 |
| Puerto Rico | 18.2250000 |
| Qatar | 25.3090000 |
| Reunion | -20.9540000 |
| Romania | 44.5260000 |
| Russia | 55.6750000 |
| Rwanda | -2.0320000 |
| Saint Barthélemy | 17.9000000 |
| Saint Helena | -15.9500000 |
| Saint Kitts and Nevis | 17.3270000 |
| Saint Lucia | 13.8980000 |
| Saint Martin | 18.0833333 |
| Saint Vincent and the Grenadines | 13.2540000 |
| Saint-Pierre-et-Miquelon | 46.8333333 |
| Samoa | -13.6330000 |
| San Marino | 43.7666667 |
| Sao Tome and Principe | 1.0000000 |
| Saudi Arabia | 23.0690000 |
| Senegal | 14.7720000 |
| Serbia | 44.0467300 |
| Seychelles | -4.6640000 |
| Sierra Leone | 8.7010000 |
| Singapore | 1.3550000 |
| Slovak Republic | 48.7853800 |
| Slovenia | 46.1279000 |
| Solomon Islands | -9.6250000 |
| Somalia | 10.6330000 |
| South Africa | -29.1300000 |
| Spain | 37.3980000 |
| Sri Lanka | 6.8680000 |
| Sudan | 14.0430000 |
| Suriname | 5.6050000 |
| Svalbard | 78.0000000 |
| Swaziland | -26.5450000 |
| Sweden | 59.2780000 |
| Switzerland | 47.4080000 |
| Syria | 33.4580000 |
| Taiwan | 23.6448500 |
| Tajikistan | 37.8060000 |
| Tanzania | -2.1540000 |
| Thailand | 13.7700000 |
| Timor-Leste | -8.8333333 |
| Togo | 6.1940000 |
| Tokelau | -9.0000000 |
| Tonga | -21.1730000 |
| Trinidad and Tobago | 10.4180000 |
| Tunisia | 36.8160000 |
| Turkey | 41.2020000 |
| Turkmenistan | 39.1293800 |
| Turks and Caicos Islands | 21.7500000 |
| Tuvalu | -8.0000000 |
| Uganda | 0.2280000 |
| Ukraine | 50.2810000 |
| United Arab Emirates | 23.3900000 |
| United Kingdom | 51.5100000 |
| United States | 34.3600000 |
| Uruguay | -34.8220000 |
| USSR | NA |
| Uzbekistan | 41.2720000 |
| Wallis et Futuna | -13.3000000 |
| Vanuatu | -15.2330000 |
| Venezuela | 9.8430000 |
| West Bank and Gaza | 31.4166667 |
| West Germany | NA |
| Western Sahara | 24.6191300 |
| Vietnam | 10.7980000 |
| Virgin Islands (U.S.) | 17.7360000 |
| Yemen, Rep. | 15.2280000 |
| Yugoslavia | NA |
| Zambia | -12.9420000 |
| Zimbabwe | -17.8760000 |
| Åland | 60.0000000 |
# Load the readxl and gdata package
library(readxl)
# Specification of url: url_xls
url_xls <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/latitude.xls"
# Download file behind URL, name it local_latitude.xls
#download.file(url_xls, destfile = "local_latitude.xls")
# Import the local .xls file with readxl: excel_readxl
excel_readxl <- read_excel("local_latitude.xls")
excel_readxl %>%
kable(caption = "Latitude from read_excel") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689") %>%
scroll_box(width = "100%", height = "300px")| country | 1700 |
|---|---|
| Afghanistan | 34.5650000 |
| Akrotiri and Dhekelia | 34.6166667 |
| Albania | 41.3120000 |
| Algeria | 36.7200000 |
| American Samoa | -14.3070000 |
| Andorra | 42.5460000 |
| Angola | -8.8430000 |
| Anguilla | 18.2500000 |
| Antigua and Barbuda | 17.0720000 |
| Argentina | -36.6760000 |
| Armenia | 40.2540000 |
| Aruba | 12.5130000 |
| Australia | -32.2190000 |
| Austria | 48.2310000 |
| Azerbaijan | 40.3520000 |
| Bahamas | 24.7000000 |
| Bahrain | 26.0240000 |
| Bangladesh | 23.8800000 |
| Barbados | 13.1790000 |
| Belarus | 53.5470900 |
| Belgium | 50.8370000 |
| Belize | 17.8430000 |
| Benin | 6.3640000 |
| Bermuda | 32.2170000 |
| Bhutan | 27.4790000 |
| Bolivia | -15.1900000 |
| Bosnia and Herzegovina | 44.1750100 |
| Botswana | -21.5360000 |
| Brazil | -19.5570000 |
| British Virgin Islands | 18.5000000 |
| Brunei | 4.5010000 |
| Bulgaria | 42.0730000 |
| Burkina Faso | 12.0490000 |
| Burundi | -3.3650000 |
| Cambodia | 12.0260000 |
| Cameroon | 10.7300000 |
| Canada | 43.7270000 |
| Cape Verde | 15.0910000 |
| Cayman Islands | 19.3190000 |
| Central African Rep. | 4.3310000 |
| Chad | 10.3770000 |
| Channel Islands | 49.2170000 |
| Chile | -33.5540000 |
| China | 29.5610000 |
| Christmas Island | -10.5000000 |
| Cocos Island | -12.5000000 |
| Colombia | 4.7880000 |
| Comoros | -11.6710000 |
| Congo, Dem. Rep. | -2.9202760 |
| Congo, Rep. | -3.6840000 |
| Cook Islands | -21.2333333 |
| Costa Rica | 9.9410000 |
| Cote d’Ivoire | 5.4960000 |
| Croatia | 45.1100800 |
| Cuba | 23.0840000 |
| Cyprus | 35.0810000 |
| Czech Rep. | 49.7792900 |
| Denmark | 55.7180000 |
| Djibouti | 11.5050000 |
| Dominica | 15.4330000 |
| Dominican Rep. | 18.5610000 |
| East Germany | NA |
| Ecuador | -2.0620000 |
| Egypt | 29.9960000 |
| El Salvador | 13.7750000 |
| Equatorial Guinea | 2.3260000 |
| Eritrea | 15.3120000 |
| Estonia | 58.6850000 |
| Ethiopia | 9.0070000 |
| Faeroe Islands | 62.0000000 |
| Falkland Islands (Malvinas) | -51.7500000 |
| Fiji | -17.8270000 |
| Finland | 60.2120000 |
| France | 48.8570000 |
| French Guiana | 3.9880000 |
| French Polynesia | -17.6660000 |
| Gabon | 0.3720000 |
| Gambia | 13.2570000 |
| Georgia | 42.0340000 |
| Germany | 48.1610000 |
| Ghana | 6.6940000 |
| Gibraltar | 36.1333333 |
| Greece | 38.0580000 |
| Greenland | 64.2170000 |
| Grenada | 12.1150000 |
| Guadeloupe | 16.1630000 |
| Guam | 13.4400000 |
| Guatemala | 14.6220000 |
| Guernsey | 49.4666667 |
| Guinea | 11.6710000 |
| Guinea-Bissau | 12.2620000 |
| Guyana | 5.7610000 |
| Haiti | 18.9320000 |
| Holy See | 41.9000000 |
| Honduras | 14.1940000 |
| Hong Kong, China | 22.7040000 |
| Hungary | 47.4190000 |
| Iceland | 63.8920000 |
| India | 25.2740000 |
| Indonesia | -6.5620000 |
| Iran | 35.3800000 |
| Iraq | 33.3140000 |
| Ireland | 54.6110000 |
| Isle of Man | 54.2250000 |
| Israel | 32.0840000 |
| Italy | 45.4150000 |
| Jamaica | 18.0550000 |
| Japan | 35.7120000 |
| Jersey | 49.2500000 |
| Jordan | 31.6020000 |
| Kazakhstan | 44.3080000 |
| Kenya | -0.5130000 |
| Kiribati | 1.8470000 |
| Korea, Dem. Rep. | 39.5270000 |
| Korea, Rep. | 37.5530000 |
| Kosovo | 42.5833333 |
| Kuwait | 29.3260000 |
| Kyrgyzstan | 41.4894000 |
| Laos | 16.5470000 |
| Latvia | 56.8580000 |
| Lebanon | 34.1110000 |
| Lesotho | -29.5950000 |
| Liberia | 6.3850000 |
| Libya | 32.6070000 |
| Liechtenstein | 47.1410000 |
| Lithuania | 55.3180000 |
| Luxembourg | 49.7800000 |
| Macao, China | 22.5060000 |
| Macedonia, FYR | 41.5738100 |
| Madagascar | -18.9580000 |
| Malawi | -15.8110000 |
| Malaysia | 3.2690000 |
| Maldives | 3.2500000 |
| Mali | 12.5080000 |
| Malta | 35.8870000 |
| Marshall Islands | 9.0000000 |
| Martinique | 14.6540000 |
| Mauritania | 17.9250000 |
| Mauritius | -20.2320000 |
| Mayotte | -12.8333333 |
| Mexico | 16.7590000 |
| Micronesia, Fed. Sts. | 7.3570000 |
| Moldova | 47.1660000 |
| Monaco | 43.7460000 |
| Mongolia | 47.4930000 |
| Montenegro | 42.7889900 |
| Montserrat | 16.7500000 |
| Morocco | 33.5930000 |
| Mozambique | -18.4990000 |
| Myanmar | 17.6790000 |
| Namibia | -17.9790000 |
| Nauru | -0.5333333 |
| Nepal | 27.7120000 |
| Netherlands | 51.8740000 |
| Netherlands Antilles | 12.1920000 |
| New Caledonia | -21.3290000 |
| New Zealand | -36.8920000 |
| Nicaragua | 12.2110000 |
| Niger | 13.8760000 |
| Nigeria | 6.5430000 |
| Niue | -19.0333333 |
| Norfolk Island | -29.0333333 |
| Northern Mariana Islands | 15.1780000 |
| Norway | 59.9770000 |
| Oman | 20.4450000 |
| Pakistan | 31.1730000 |
| Palau | 7.5000000 |
| Panama | 9.2060000 |
| Papua New Guinea | -6.6000000 |
| Paraguay | -25.5830000 |
| Peru | -11.7940000 |
| Philippines | 13.9220000 |
| Pitcairn | -25.0666667 |
| Poland | 50.2440000 |
| Portugal | 38.8160000 |
| Puerto Rico | 18.2250000 |
| Qatar | 25.3090000 |
| Reunion | -20.9540000 |
| Romania | 44.5260000 |
| Russia | 55.6750000 |
| Rwanda | -2.0320000 |
| Saint Barthélemy | 17.9000000 |
| Saint Helena | -15.9500000 |
| Saint Kitts and Nevis | 17.3270000 |
| Saint Lucia | 13.8980000 |
| Saint Martin | 18.0833333 |
| Saint Vincent and the Grenadines | 13.2540000 |
| Saint-Pierre-et-Miquelon | 46.8333333 |
| Samoa | -13.6330000 |
| San Marino | 43.7666667 |
| Sao Tome and Principe | 1.0000000 |
| Saudi Arabia | 23.0690000 |
| Senegal | 14.7720000 |
| Serbia | 44.0467300 |
| Seychelles | -4.6640000 |
| Sierra Leone | 8.7010000 |
| Singapore | 1.3550000 |
| Slovak Republic | 48.7853800 |
| Slovenia | 46.1279000 |
| Solomon Islands | -9.6250000 |
| Somalia | 10.6330000 |
| South Africa | -29.1300000 |
| Spain | 37.3980000 |
| Sri Lanka | 6.8680000 |
| Sudan | 14.0430000 |
| Suriname | 5.6050000 |
| Svalbard | 78.0000000 |
| Swaziland | -26.5450000 |
| Sweden | 59.2780000 |
| Switzerland | 47.4080000 |
| Syria | 33.4580000 |
| Taiwan | 23.6448500 |
| Tajikistan | 37.8060000 |
| Tanzania | -2.1540000 |
| Thailand | 13.7700000 |
| Timor-Leste | -8.8333333 |
| Togo | 6.1940000 |
| Tokelau | -9.0000000 |
| Tonga | -21.1730000 |
| Trinidad and Tobago | 10.4180000 |
| Tunisia | 36.8160000 |
| Turkey | 41.2020000 |
| Turkmenistan | 39.1293800 |
| Turks and Caicos Islands | 21.7500000 |
| Tuvalu | -8.0000000 |
| Uganda | 0.2280000 |
| Ukraine | 50.2810000 |
| United Arab Emirates | 23.3900000 |
| United Kingdom | 51.5100000 |
| United States | 34.3600000 |
| Uruguay | -34.8220000 |
| USSR | NA |
| Uzbekistan | 41.2720000 |
| Wallis et Futuna | -13.3000000 |
| Vanuatu | -15.2330000 |
| Venezuela | 9.8430000 |
| West Bank and Gaza | 31.4166667 |
| West Germany | NA |
| Western Sahara | 24.6191300 |
| Vietnam | 10.7980000 |
| Virgin Islands (U.S.) | 17.7360000 |
| Yemen, Rep. | 15.2280000 |
| Yugoslavia | NA |
| Zambia | -12.9420000 |
| Zimbabwe | -17.8760000 |
| Åland | 60.0000000 |
It appears that readxl is not (yet?) able to deal with Excel files that are on the web. However, a simply workaround with download.file() fixes this.
Downloading any file, secure or not
In the previous exercise you’ve seen how you can read excel files on the web using the read_excel package by first downloading the file with the download.file() function.
There’s more: with download.file() you can download any kind of file from the web, using HTTP and HTTPS: images, executable files, but also .RData files. An RData file is very efficient format to store R data.
You can load data from an RData file using the load() function, but this function does not accept a URL string as an argument. In this exercise, you’ll first download the RData file securely, and then import the local data file.
# https URL to the wine RData file.
url_rdata <- "https://s3.amazonaws.com/assets.datacamp.com/production/course_1478/datasets/wine.RData"
# Download the wine file to your working directory
download.file(url_rdata, destfile = "wine_local.RData")
# Load the wine data into your workspace using load()
load("wine_local.RData")
# Print out the summary of the wine data
sum_wine <- data.frame(summary(wine))
x <- c("name", "Variable", "Statistics")
colnames(sum_wine) <- x
sum_wine[, 2:3] %>%
kable(caption = "Summary statistics of Wine data frame") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689") %>%
scroll_box(width = "100%", height = "300px")| Variable | Statistics |
|---|---|
| Alcohol | Min. :11.03 |
| Alcohol | 1st Qu.:12.36 |
| Alcohol | Median :13.05 |
| Alcohol | Mean :12.99 |
| Alcohol | 3rd Qu.:13.67 |
| Alcohol | Max. :14.83 |
| Malic acid | Min. :0.74 |
| Malic acid | 1st Qu.:1.60 |
| Malic acid | Median :1.87 |
| Malic acid | Mean :2.34 |
| Malic acid | 3rd Qu.:3.10 |
| Malic acid | Max. :5.80 |
| Ash | Min. :1.360 |
| Ash | 1st Qu.:2.210 |
| Ash | Median :2.360 |
| Ash | Mean :2.366 |
| Ash | 3rd Qu.:2.560 |
| Ash | Max. :3.230 |
| Alcalinity of ash | Min. :10.60 |
| Alcalinity of ash | 1st Qu.:17.20 |
| Alcalinity of ash | Median :19.50 |
| Alcalinity of ash | Mean :19.52 |
| Alcalinity of ash | 3rd Qu.:21.50 |
| Alcalinity of ash | Max. :30.00 |
| Magnesium | Min. : 70.00 |
| Magnesium | 1st Qu.: 88.00 |
| Magnesium | Median : 98.00 |
| Magnesium | Mean : 99.59 |
| Magnesium | 3rd Qu.:107.00 |
| Magnesium | Max. :162.00 |
| Total phenols | Min. :0.980 |
| Total phenols | 1st Qu.:1.740 |
| Total phenols | Median :2.350 |
| Total phenols | Mean :2.292 |
| Total phenols | 3rd Qu.:2.800 |
| Total phenols | Max. :3.880 |
| Flavanoids | Min. :0.340 |
| Flavanoids | 1st Qu.:1.200 |
| Flavanoids | Median :2.130 |
| Flavanoids | Mean :2.023 |
| Flavanoids | 3rd Qu.:2.860 |
| Flavanoids | Max. :5.080 |
| Nonflavanoid phenols | Min. :0.1300 |
| Nonflavanoid phenols | 1st Qu.:0.2700 |
| Nonflavanoid phenols | Median :0.3400 |
| Nonflavanoid phenols | Mean :0.3623 |
| Nonflavanoid phenols | 3rd Qu.:0.4400 |
| Nonflavanoid phenols | Max. :0.6600 |
| Proanthocyanins | Min. :0.410 |
| Proanthocyanins | 1st Qu.:1.250 |
| Proanthocyanins | Median :1.550 |
| Proanthocyanins | Mean :1.587 |
| Proanthocyanins | 3rd Qu.:1.950 |
| Proanthocyanins | Max. :3.580 |
| Color intensity | Min. : 1.280 |
| Color intensity | 1st Qu.: 3.210 |
| Color intensity | Median : 4.680 |
| Color intensity | Mean : 5.055 |
| Color intensity | 3rd Qu.: 6.200 |
| Color intensity | Max. :13.000 |
| Hue | Min. :1.270 |
| Hue | 1st Qu.:1.930 |
| Hue | Median :2.780 |
| Hue | Mean :2.604 |
| Hue | 3rd Qu.:3.170 |
| Hue | Max. :4.000 |
| Proline | Min. : 278.0 |
| Proline | 1st Qu.: 500.0 |
| Proline | Median : 672.0 |
| Proline | Mean : 745.1 |
| Proline | 3rd Qu.: 985.0 |
| Proline | Max. :1680.0 |
Another way to load remote RData files is to use the url() function inside load(). However, this will not save the RData file to a local file.
Remember that you can’t directly use a URL string inside load() to load remote RData files. You should use url() or download the file first using download.file().
HTTP? httr! (1)
Downloading a file from the Internet means sending a GET request and receiving the file you asked for. Internally, all the previously discussed functions use a GET request to download files.
httr provides a convenient function, GET() to execute this GET request. The result is a response object, that provides easy access to the status code, content-type and, of course, the actual content.
You can extract the content from the request using the content() function. At the time of writing, there are three ways to retrieve this content: as a raw object, as a character vector, or an R object, such as a list. If you don’t tell content() how to retrieve the content through the as argument, it’ll try its best to figure out which type is most appropriate based on the content-type.
# Load the httr package
library(httr)
# Get the url, save response to resp
url <- "http://www.example.com/"
resp <- GET(url)
# Print resp
resp## Response [http://www.example.com/]
## Date: 2019-02-18 13:39
## Status: 200
## Content-Type: text/html; charset=UTF-8
## Size: 1.27 kB
## <!doctype html>
## <html>
## <head>
## <title>Example Domain</title>
##
## <meta charset="utf-8" />
## <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
## <meta name="viewport" content="width=device-width, initial-scale=1" />
## <style type="text/css">
## body {
## ...
# Get the raw content of resp: raw_content
raw_content <- content(resp, as = "raw")
# Print the head of raw_content
head(raw_content)## [1] 3c 21 64 6f 63 74
The raw content of the response doesn’t make a lot of sense, does it? Luckily, the content() function by default, if you don’t specify the as argument, figures out what type of data you’re dealing with and parses it for you.
HTTP? httr! (2)
Web content does not limit itself to HTML pages and files stored on remote servers such as DataCamp’s Amazon S3 instances. There are many other data formats out there. A very common one is JSON. This format is very often used by so-called Web APIs, interfaces to web servers with which you as a client can communicate to get or store information in more complicated ways.
You’ll learn about Web APIs and JSON in the exercises that follow, but some experimentation never hurts, does it?
# Get the url
url <- "http://www.omdbapi.com/?apikey=72bc447a&t=Annie+Hall&y=&plot=short&r=json"
resp <- GET(url)
# Print resp
print(resp)## Response [http://www.omdbapi.com/?apikey=72bc447a&t=Annie+Hall&y=&plot=short&r=json]
## Date: 2019-02-18 13:39
## Status: 200
## Content-Type: application/json; charset=utf-8
## Size: 929 B
# Print content of resp as text
print(content(resp, as = "text"))## [1] "{\"Title\":\"Annie Hall\",\"Year\":\"1977\",\"Rated\":\"PG\",\"Released\":\"20 Apr 1977\",\"Runtime\":\"93 min\",\"Genre\":\"Comedy, Romance\",\"Director\":\"Woody Allen\",\"Writer\":\"Woody Allen, Marshall Brickman\",\"Actors\":\"Woody Allen, Diane Keaton, Tony Roberts, Carol Kane\",\"Plot\":\"Neurotic New York comedian Alvy Singer falls in love with the ditzy Annie Hall.\",\"Language\":\"English, German\",\"Country\":\"USA\",\"Awards\":\"Won 4 Oscars. Another 26 wins & 8 nominations.\",\"Poster\":\"https://m.media-amazon.com/images/M/MV5BZDg1OGQ4YzgtM2Y2NS00NjA3LWFjYTctMDRlMDI3NWE1OTUyXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_SX300.jpg\",\"Ratings\":[{\"Source\":\"Internet Movie Database\",\"Value\":\"8.0/10\"},{\"Source\":\"Rotten Tomatoes\",\"Value\":\"97%\"},{\"Source\":\"Metacritic\",\"Value\":\"92/100\"}],\"Metascore\":\"92\",\"imdbRating\":\"8.0\",\"imdbVotes\":\"233,030\",\"imdbID\":\"tt0075686\",\"Type\":\"movie\",\"DVD\":\"28 Apr 1998\",\"BoxOffice\":\"N/A\",\"Production\":\"United Artists\",\"Website\":\"N/A\",\"Response\":\"True\"}"
# Print content of resp
print(content(resp))## $Title
## [1] "Annie Hall"
##
## $Year
## [1] "1977"
##
## $Rated
## [1] "PG"
##
## $Released
## [1] "20 Apr 1977"
##
## $Runtime
## [1] "93 min"
##
## $Genre
## [1] "Comedy, Romance"
##
## $Director
## [1] "Woody Allen"
##
## $Writer
## [1] "Woody Allen, Marshall Brickman"
##
## $Actors
## [1] "Woody Allen, Diane Keaton, Tony Roberts, Carol Kane"
##
## $Plot
## [1] "Neurotic New York comedian Alvy Singer falls in love with the ditzy Annie Hall."
##
## $Language
## [1] "English, German"
##
## $Country
## [1] "USA"
##
## $Awards
## [1] "Won 4 Oscars. Another 26 wins & 8 nominations."
##
## $Poster
## [1] "https://m.media-amazon.com/images/M/MV5BZDg1OGQ4YzgtM2Y2NS00NjA3LWFjYTctMDRlMDI3NWE1OTUyXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_SX300.jpg"
##
## $Ratings
## $Ratings[[1]]
## $Ratings[[1]]$Source
## [1] "Internet Movie Database"
##
## $Ratings[[1]]$Value
## [1] "8.0/10"
##
##
## $Ratings[[2]]
## $Ratings[[2]]$Source
## [1] "Rotten Tomatoes"
##
## $Ratings[[2]]$Value
## [1] "97%"
##
##
## $Ratings[[3]]
## $Ratings[[3]]$Source
## [1] "Metacritic"
##
## $Ratings[[3]]$Value
## [1] "92/100"
##
##
##
## $Metascore
## [1] "92"
##
## $imdbRating
## [1] "8.0"
##
## $imdbVotes
## [1] "233,030"
##
## $imdbID
## [1] "tt0075686"
##
## $Type
## [1] "movie"
##
## $DVD
## [1] "28 Apr 1998"
##
## $BoxOffice
## [1] "N/A"
##
## $Production
## [1] "United Artists"
##
## $Website
## [1] "N/A"
##
## $Response
## [1] "True"
The fact that httr converts the JSON response body automatically to an R list is very convenient.